1 Executive Summary

  • The aim of this report is to …
  • The main discoveries are …


2 Full Report

2.1 Initial Data Analysis (IDA)

The data set used to generate our research questions comes from Kiva Crowdfunding’s ‘Data Science for Good’ open data initiative. Kiva (2023) is a service that provides small loans to the world’s unbanked population. The initiative was created so that members of the public could help Kiva better understand the levels of poverty in areas where it has active loans (Kaggle, 2018). The data has a CC0: Public Domain licence, meaning that we are free to use and distribute it as we wish (Creative Commons, 2023). The data has been edited by a community member called ‘mfab’, which may reduce its reliability. However, as Kiva is listed as the ‘Owner’ of the dataset, we assume that it has approved this editor and that the data remains reliable. A notable limitation is that the data predates the COVID-19 pandemic, which may have changed lending trends in ways that would have made for interesting research.

The world_gdp data is reliable because it is collected and consolidated by the World Bank, an international financial institution that is a specialised agency of the United Nations system. It has a CC BY 4.0 licence, which allows users to copy, modify and distribute the data in any format for any purpose (World Bank, 2021).

The countries dataset from Kaggle is less reliable, as it was uploaded by an independent user, and some other Kaggle users have expressed concern about its reliability. Since only the country name and region columns are used, we believe it is suitable for the purposes of this report.

Some data wrangling was required before the datasets could be used. Most of it consisted of grouping the data to isolate variables for comparison. Functions such as mutate, rename and merge were also used to change variable values, rename columns and merge datasets together to generate our figures.
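The merge step can be illustrated on toy data. The sketch below is a minimal base-R example in which the hypothetical data frames `loans` and `lookup` stand in for the real datasets. It joins on key columns with different names (the `by.x`/`by.y` arguments) and shows that `merge()` performs an inner join by default and adds `.x` and `.y` suffixes to column names that appear in both inputs.

```r
# Hypothetical stand-ins for the Kiva loans and a country lookup table
loans <- data.frame(
  country_code = c("KE", "KE", "PH"),
  region       = c("Nairobi", "Kisumu", "Cebu"),  # sub-national region, as in the loan data
  loan_amount  = c(500, 300, 250)
)
lookup <- data.frame(
  code   = c("KE", "PH", "PE"),
  region = c("SUB-SAHARAN AFRICA", "ASIA (EX. NEAR EAST)", "LATIN AMER. & CARIB")  # world region
)

# Join on differently named keys; the clashing 'region' columns become region.x and region.y.
# "PE" has no matching loans, so the default inner join drops it.
joined <- merge(loans, lookup, by.x = "country_code", by.y = "code")
print(joined)
```

This suffixing is why the analysis refers to `region.y` (the world region) after merging the Kiva loans, which already contain a sub-national `region` column, with the country lookup.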

Potential stakeholders for this report include people considering lending money through services like Kiva who want to know where their money is sent and how it is used. This report gives a general overview of the types of people who apply for loans, how large their loans are and how the loaned money is used.

2.2 Data setup

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0     ✔ purrr   0.3.4
## ✔ tibble  3.1.7     ✔ dplyr   1.0.9
## ✔ tidyr   1.2.1     ✔ stringr 1.4.0
## ✔ readr   2.1.3     ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(tmap)
library(countrycode)
library(janitor)
## 
## Attaching package: 'janitor'
## 
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
library(plotly)
## 
## Attaching package: 'plotly'
## 
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following object is masked from 'package:graphics':
## 
##     layout
kiva_loans <- read_csv("data/kiva_loans.csv") # read in Kiva data
## Rows: 671205 Columns: 20
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (10): activity, sector, use, country_code, country, region, currency, t...
## dbl   (6): id, funded_amount, loan_amount, partner_id, term_in_months, lende...
## dttm  (3): posted_time, disbursed_time, funded_time
## date  (1): date
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

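We created our own country dataset by combining two Kaggle datasets:

  • https://www.kaggle.com/datasets/juanumusic/countries-iso-codes: English short names of countries with their ISO Alpha-2, Alpha-3 and numeric codes.
  • https://www.kaggle.com/datasets/fernandol/countries-of-the-world: 227 countries with their world region and indicators such as population, area and GDP per capita. Only the country name and region are used here.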

# Creating our own 'Countries' dataset from two other datasets
countries <- read_csv("data/countries.csv")
## Rows: 227 Columns: 20
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (11): Country, Region, Pop. Density (per sq. mi.), Coastline (coast/area...
## dbl  (3): Population, Area (sq. mi.), GDP ($ per capita)
## num  (6): Infant mortality (per 1000 births), Literacy (%), Other (%), Clima...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
countries_clean <- countries %>%
  clean_names() %>%
  subset(select = c("country", "region"))

iso_codes <- read_csv("data/iso_codes.csv") %>%
  rename(name = `English short name lower case`) %>%
  rename(country_code = `Alpha-2 code`) %>%
  subset(select = c("name", "country_code")) # keep the name and the 2-letter (Alpha-2) code, which matches Kiva's country_code
## Rows: 246 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): English short name lower case, Alpha-2 code, Alpha-3 code, Numeric ...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
iso_countries <- merge(countries_clean, iso_codes, by.x = "country", by.y = "name")
head(iso_countries)
##          country               region country_code
## 1    Afghanistan ASIA (EX. NEAR EAST)           AF
## 2        Albania       EASTERN EUROPE           AL
## 3        Algeria      NORTHERN AFRICA           DZ
## 4 American Samoa              OCEANIA           AS
## 5        Andorra       WESTERN EUROPE           AD
## 6         Angola   SUB-SAHARAN AFRICA           AO
# Merging Countries dataset with Kiva loans

kiva_countries <- merge(kiva_loans, iso_countries, by.x = "country_code", by.y = "country_code")
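Both `merge()` calls above are inner joins by default, so any country whose name or code is spelled differently in the two sources is dropped without a warning. A quick check before merging is to list the keys that have no partner with `setdiff()`. The sketch below uses hypothetical name vectors to show the idea.

```r
# Hypothetical country names from two sources that are to be joined by name
kiva_names <- c("Kenya", "Kyrgyzstan", "Cote D'Ivoire", "Philippines")
gdp_names  <- c("Kenya", "Kyrgyz Republic", "Cote d'Ivoire", "Philippines")

# Names in the first source with no exact (case-sensitive) match in the second;
# an inner merge() would silently drop these rows
unmatched <- setdiff(kiva_names, gdp_names)
print(unmatched)
```

On the real data the same check could be run as, for example, `setdiff(countries_clean$country, iso_codes$name)`, and any names it returns can then be recoded by hand before merging.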

2.3 Research Question 1

Is there a correlation between a country’s GDP per capita and its total loan sum?

To answer this question, an interactive scatter plot was drawn using the code below. Hovering over a data point shows the country’s name, its GDP per capita and its total loan sum. Both axes use a base-10 logarithmic scale to spread out the data so that relationships can be seen.
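Why a log scale spreads the data out can be shown with a few hypothetical GDP-per-capita values spanning three orders of magnitude. On a linear axis the two smaller values are squeezed into the first tenth of the range, while on a log10 axis each tenfold increase takes up the same distance.

```r
# Hypothetical GDP per capita values (USD)
gdp <- c(500, 5000, 50000)

# Position of each value on a linear axis, as a fraction of the largest value
linear_share <- gdp / max(gdp)

# Position on a log10 axis; successive tenfold increases are one unit apart
log_positions <- log10(gdp)

print(linear_share)
print(diff(log_positions))
```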

world_gdp <- read_csv("data/world_gdp.csv", skip = 4) # reads the World Bank file, skipping the metadata lines at the top

gdp_cleaned <- world_gdp %>% 
  rename(country = "Country Name") %>% # renames the country column
  mutate(country = recode(country, `Congo, Rep.` = "Congo", `Congo, Dem. Rep.` = "The Democratic Republic of the Congo", `Cote d'Ivoire` = "Cote D'Ivoire", `Egypt, Arab Rep.` = "Egypt", `Kyrgyz Republic` = "Kyrgyzstan", `Lao PDR` = "Lao People's Democratic Republic", `Myanmar` = "Myanmar (Burma)", `West Bank and Gaza` = "Palestine", `St. Vincent and the Grenadines` = "Saint Vincent and the Grenadines", `Turkiye` = "Turkey", `Virgin Islands (U.S.)` = "Virgin Islands", `Yemen, Rep.` = "Yemen")) %>% # matches World Bank names to Kiva names ('Turkiye' is the World Bank's newer name for Turkey; Turks and Caicos Islands is a different territory)
  select(country, country_gdp = "2015") # keeps one 2015 GDP per capita value per country

kiva_gdp <- kiva_loans %>%
  group_by(country) %>%
  summarise(loan_sum = sum(loan_amount)) # total loaned amount per country

kiva_gdp <- merge(kiva_gdp, gdp_cleaned, by = "country", all.x = TRUE, all.y = FALSE) # left join: keeps every Kiva country, GDP is NA where no match

colnames(kiva_gdp) <- c("Country", "Loan Sum", "GDP per Capita")

plot_kiva_gdp <- ggplot(kiva_gdp, aes(x = `GDP per Capita`, y = `Loan Sum`, country = `Country`)) + # 'country' is only used for the plotly hover text
  geom_point(colour = "magenta") + 
  scale_x_continuous(trans = "log10") +
  scale_y_continuous(trans = "log10") +
  labs(x = "GDP per capita (USD)", y = "Sum of Loans (USD)", title = "The total sum of Kiva loans against the GDP per capita for each country")
ggplotly(plot_kiva_gdp)

The scatter plot above uses GDP per capita figures from 2015, chosen because the Kiva loan data covers 2014 to 2017. The points cluster in the top left of the plot, which suggests that countries with a lower GDP per capita account for most of the loaned money. There is, however, no clear linear relationship between the total loaned amount and GDP per capita. It is worth noting that the countries with the highest total loan sums tend to be developing countries with a relatively low GDP per capita. In such countries, agriculture tends to be the largest sector in terms of its share of GDP and employment (https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=7d15e1f8e20c97dd4eeeccd2ad3f5e3e4255b831). This leads into the next research question, which looks at how funding is distributed across sectors.
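The visual impression of "no clear linear relationship" could be backed by a correlation coefficient. The sketch below uses hypothetical country-level values, not the Kiva figures. It computes a Pearson correlation on the log10 values (linear association on the plotted axes) and a Spearman rank correlation, which depends only on ranks and is therefore unchanged by the log transform.

```r
# Hypothetical country-level data (USD), not taken from the report
gdp_per_capita <- c(800, 1500, 3000, 6000, 12000, 25000)
loan_sum       <- c(9e6, 2e7, 4e6, 1.5e6, 3e5, 5e4)

# Linear association between the log10 values, i.e. on the plotted log-log axes
pearson_log <- cor(log10(gdp_per_capita), log10(loan_sum))

# Rank-based association; identical whether or not the values are logged
spearman     <- cor(gdp_per_capita, loan_sum, method = "spearman")
spearman_log <- cor(log10(gdp_per_capita), log10(loan_sum), method = "spearman")

print(c(pearson_log = pearson_log, spearman = spearman))
```

On the real data, `kiva_gdp` contains NA GDP values for unmatched countries, so a call such as `cor(log10(kiva_gdp$"GDP per Capita"), log10(kiva_gdp$"Loan Sum"), use = "complete.obs", method = "spearman")` would be needed. We have not computed that value here.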

2.4 Research Question 2

What is the distribution of funding between the Kiva designated sectors?

kiva_loans %>%
  group_by(sector) %>%
  summarise(total_funding = signif(sum(funded_amount) / 1e6, 5)) %>% # total funding per sector in million USD, 5 significant figures
  ggplot(aes(x = fct_reorder(sector, desc(total_funding)), y = total_funding)) +
  geom_col() + # one bar per sector, ordered from most to least funded
  labs(x = "Sector", y = "Total Funding (Million USD)", title = "Total Kiva funding per Sector") +
  theme_classic() +
  geom_text(aes(label = total_funding), vjust = -0.5, size = 2) + # value label above each bar
  theme(axis.text.x = element_text(angle = 60, vjust = 0.5, hjust = 0.4))

The bar plot above shows that Agriculture, Food and Retail are by far the most funded Kiva sectors. Together they receive twice as much funding as the remaining 12 sectors combined. Interestingly, the ‘use’ column of the data frame shows that many loans categorised as Retail are in fact loans to purchase food items such as salt, rice and flour.
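The "twice as much" comparison amounts to sorting the sector totals and dividing the sum of the top three by the sum of the rest. The sketch below shows this arithmetic on hypothetical sector totals in million USD; the real totals would come from the same `group_by(sector)` and `summarise()` step used for the bar chart.

```r
# Hypothetical sector totals in million USD (not the real Kiva figures)
sector_totals <- c(Agriculture = 30, Food = 28, Retail = 22,
                   Services = 10, Clothing = 8, Education = 6, Housing = 6)

# Sort from largest to smallest, then compare the top three with all other sectors combined
sorted <- sort(sector_totals, decreasing = TRUE)
top3   <- sum(sorted[1:3])
rest   <- sum(sorted[-(1:3)])
ratio  <- top3 / rest

print(ratio)
```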

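library(gt)
kiva_loans %>%
  filter(sector == 'Food') %>% # loans in the Food sector
  select(use) %>% # the borrower's stated use of the loan
  slice(1:10) %>% # first ten loans only
  gt() # render as a table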
use
To buy seasonal, fresh fruits to sell.
to purchase one buffalo.
to buy a stall, gram flour, ketchup, and coal for selling ladoo.
to buy ingredients to make bakery products.
to purchase vegetables, chicken, and oil to cook food to sell.
to purchase a variety of needed food items to prepare food to sell.
to purchase one cow.
to purchase a new, bigger-size cart.
to purchase sacks of tomatoes, potatoes, fruits, and green vegetables for resale
to buy meat and also to start selling fish in his butcher shop.
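The claim that many Retail loans are really for food could be quantified by flagging ‘use’ descriptions that mention staple foods. The sketch below does this on hypothetical Retail-sector descriptions. The keyword list is an illustrative assumption, and keyword matching is crude: it misses synonyms and may flag non-food uses of the same words.

```r
# Hypothetical 'use' descriptions for Retail-sector loans
retail_use <- c("to buy sacks of rice and flour to sell.",
                "to purchase clothes for resale.",
                "to buy salt, sugar and cooking oil for her store.",
                "to buy a new display shelf.")

# Illustrative list of staple-food keywords (an assumption, not from the report)
food_words <- c("rice", "flour", "salt", "sugar", "oil", "maize", "beans")

# Whole-word, case-insensitive match against any keyword
pattern <- paste0("\\b(", paste(food_words, collapse = "|"), ")\\b")
is_food <- grepl(pattern, retail_use, ignore.case = TRUE, perl = TRUE)

# Share of Retail descriptions that mention at least one keyword
share_food <- mean(is_food)
print(share_food)
```

Applied to the real data, the same pattern would be used on `kiva_loans$use` after filtering to `sector == "Retail"`.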

The conclusions drawn from Research Question 1 tell us that Kiva loans are most often requested in developing countries with a low GDP per capita. In these countries access to food is not a given, and creating a steady food supply may help lift some people out of poverty. In his 2015 report, Robert Townsend states that, for the world’s poorest, growth in agriculture is two to four times more effective in raising living standards than growth in the next closest sector (Townsend, 2015). This creates a potential win-win scenario for all stakeholders: as long as a loan is used wisely, borrowers can create more value than they originally borrowed, and lenders can see their money achieve more than its dollar value. It is therefore not surprising that agriculture loans make up 27% of all Kiva loans.
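A sector’s share of "all Kiva loans" can mean its share of the number of loans or its share of the amount loaned, and the two can differ. The sketch below computes both on a hypothetical set of ten loans, using `prop.table()` for counts and `tapply()` for amounts.

```r
# Hypothetical loans: sector and loan amount in USD
sector <- c("Agriculture", "Agriculture", "Food", "Retail", "Agriculture",
            "Services", "Food", "Retail", "Education", "Food")
amount <- c(400, 600, 300, 200, 500, 1000, 250, 150, 800, 300)

# Share of the number of loans in each sector
share_by_count <- prop.table(table(sector))

# Share of the total amount loaned in each sector
share_by_amount <- tapply(amount, sector, sum) / sum(amount)

print(round(share_by_count, 3))
print(round(share_by_amount, 3))
```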


Townsend, R. (2015). Ending poverty and hunger by 2030: An agenda for the global food system. Washington, DC: World Bank. Retrieved from https://documents.worldbank.org/en/publication/documents-reports/documentdetail/700061468334490682/ending-poverty-and-hunger-by-2030-an-agenda-for-the-global-food-system

2.5 Research Question 3

Insert text and analysis.

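# Grouping agriculture loans by region
agriculture_by_region <- kiva_countries %>% 
  filter(sector == "Agriculture") %>% # keep only agriculture loans
  group_by(region = region.y) %>% # region.y is the world region from iso_countries; region.x is Kiva's sub-national region
  summarise(total_loans_sum = sum(loan_amount))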

Summary:

ggplot(agriculture_by_region, aes(x = region, y = total_loans_sum)) +
  geom_col() + # geom_col() already uses stat = "identity", so no stat argument is needed
  labs(x = "Region", y = "Total loan amount (USD)") +
  theme(axis.text.x = element_text(angle = 90),
        axis.text.y = element_text(angle = 90))


2.6 Research Question 4

What is the average loan amount per gender in each region?

## Keep only loans with a single male or female borrower (group loans list several genders; %in% also drops NA)
genders_clean <- kiva_countries %>% 
  filter(borrower_genders %in% c("male", "female"))

## Mean funded amount per world region and gender, averaged over individual loans
## (summing per-country means would grow with the number of countries in a region)
regions_data <- genders_clean %>% 
  group_by(region = region.y, gender = borrower_genders) %>% 
  summarise(value = mean(funded_amount), .groups = "drop")

## Plotting side-by-side bars
ggplotly(ggplot(data = regions_data, aes(x = region, y = value, fill = gender)) + 
           geom_col(position = "dodge") + 
           labs(x = "\nRegions", y = "Mean Loan Amount\n", title = "\n Mean Loan Amount Per Gender in Regions\n") + 
           theme(plot.title = element_text(hjust = 0.5), 
                 axis.title.x = element_text(face = "bold", colour = "red", size = 10), 
                 axis.title.y = element_text(face = "bold", colour = "red", size = 10), 
                 axis.text.x = element_text(angle = 45, hjust = 1)))
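The question asks for an average, so the per-region value should be a mean rather than a sum. Summing averages makes regions with more countries look larger regardless of loan size. The base-R sketch below computes the mean funded amount for every region and gender combination with `aggregate()`. The loans are hypothetical single-borrower loans.

```r
# Hypothetical single-borrower loans with world region and borrower gender
loans <- data.frame(
  region = c("EASTERN EUROPE", "EASTERN EUROPE", "EASTERN EUROPE",
             "SUB-SAHARAN AFRICA", "SUB-SAHARAN AFRICA", "SUB-SAHARAN AFRICA"),
  gender = c("female", "male", "female", "female", "male", "male"),
  funded_amount = c(1000, 1500, 800, 300, 450, 350)
)

# Mean funded amount for each region/gender pair, averaged over individual loans
means <- aggregate(funded_amount ~ region + gender, data = loans, FUN = mean)
print(means)
```

With the report’s data, the equivalent call would be `aggregate(funded_amount ~ region.y + borrower_genders, data = subset(kiva_countries, borrower_genders %in% c("male", "female")), FUN = mean)`.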